Abstract

A recurring theme in the least squares approach to phylogenetics has been the discovery of elegant combinatorial formulas for the least squares estimates of edge lengths. These formulas have proved useful for the development of efficient algorithms, and have also been important for understanding connections among popular phylogeny algorithms. For example, the selection criterion of the neighbor-joining algorithm is now understood in terms of the combinatorial formulas of Pauplin for estimating tree length.

We highlight a phylogenetically desirable property that weighted least squares methods should satisfy, and provide a complete characterization of methods that satisfy the property. The necessary and sufficient condition is a multiplicative four point condition that the variance matrix needs to satisfy. The proof is based on the observation that the Lagrange multipliers in the proof of the Gauss-Markov theorem are tree-additive. Our results generalize and complete previous work on ordinary least squares, balanced minimum evolution and the taxon-weighted variance model. They also provide a time-optimal algorithm for computation.



1 Introduction 



The least squares approach to phylogenetics was first suggested by Cavalli-Sforza & Edwards [1] and Fitch & Margoliash [8]. The precise problem formulated in [3] was Problem 1.1:

Definition 1.1. (Pair-edge incidence matrix) Given a phylogenetic X-tree T with edge set E and |X| = n (see [13] for basic definitions), the pair-edge incidence matrix of T is the $\binom{n}{2} \times |E|$ matrix

$$(S_T)_{ij,e} = \begin{cases} 1 & \text{if } e \in E \text{ is an edge on the path between } i \text{ and } j, \\ 0 & \text{otherwise.} \end{cases}$$

Definition 1.2. (Tree-additive map) Let T be a phylogenetic X-tree. A dissimilarity map D is T-additive if for some vector $l \in \mathbb{R}^{|E|}$,

$$D_{ij} = (S_T\, l)_{ij}. \qquad (1)$$
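The following is a minimal sketch of Definitions 1.1 and 1.2. It assumes a hypothetical encoding of the tree in which every non-root node points to its parent and each edge is named after its child endpoint. The tree, the dictionary `PARENT` and all function names are illustrative and do not come from the paper. With this encoding, the path between two leaves is the symmetric difference of their root paths, and this gives the rows of $S_T$ directly.

```python
from itertools import combinations

# Hypothetical encoding of the quartet ((1,2),(3,4)): each non-root node maps to its
# parent and each edge is named after its child endpoint. The internal nodes are "u"
# and "v"; "v" serves as the root, so edge "u" is the internal edge.
PARENT = {1: "u", 2: "u", 3: "v", 4: "v", "u": "v"}
LEAVES = [1, 2, 3, 4]
EDGES = [1, 2, 3, 4, "u"]  # column order of S_T


def root_path(node, parent):
    """Edges on the path from node up to the root."""
    edges = set()
    while node in parent:
        edges.add(node)
        node = parent[node]
    return edges


def path_edges(i, j, parent):
    """Edges on the path between i and j: the shared part of the root paths cancels."""
    return root_path(i, parent) ^ root_path(j, parent)


def incidence_matrix(leaves, edges, parent):
    """Pair-edge incidence matrix S_T: one row per leaf pair, one column per edge."""
    pairs = list(combinations(leaves, 2))
    rows = []
    for i, j in pairs:
        on_path = path_edges(i, j, parent)
        rows.append([1 if e in on_path else 0 for e in edges])
    return pairs, rows


def tree_additive_map(rows, lengths):
    """Equation (1): D = S_T l, in the same pair order as the rows of S_T."""
    return [sum(s * l for s, l in zip(row, lengths)) for row in rows]


pairs, S = incidence_matrix(LEAVES, EDGES, PARENT)
D = tree_additive_map(S, [1, 2, 3, 4, 5])
print(list(zip(pairs, D)))
```

For edge lengths 1, ..., 5 this yields $D = (3, 9, 10, 10, 11, 7)$ in the pair order 12, 13, 14, 23, 24, 34.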

Problem 1.1. (Ordinary least squares) Given a dissimilarity map D, find the phylogenetic X-tree T and T-additive map $\hat D$ that minimize

$$\sum_{i,j \in X} \left(D_{ij} - \hat D_{ij}\right)^2. \qquad (2)$$



For a fixed tree, the solution of Problem 1.1 is a linear algebra problem (Theorem 1.3). However, Rzhetsky & Nei [24] showed that the ordinary least squares edge lengths could instead be computed using elegant and efficient combinatorial formulas. Their result was based on an observation of Vach [23], namely that OLS edge lengths obey the desirable Independence of Irrelevant Pairs property (our choice of terminology is inspired by social choice theory [23]):

Property 1.1. (IIP) Let T be a phylogenetic X-tree and e an edge in T. A linear edge length estimator for e is a linear function from dissimilarity maps to the real numbers, i.e. $\hat l_e = \sum_{i,j} p_{ij} D_{ij}$. We say that such an estimator satisfies the IIP property if $p_{ij} = 0$ whenever the path from i to j in T (denoted $\overline{i,j}$) contains neither of e's endpoints.

In other words, the IIP property is equivalent to the statement that the sufficient statistic for the least squares estimator of the length of e is a projection of the dissimilarity map onto the coordinates given by pairs of leaves whose joining path contains at least one endpoint of e. It has been shown that this crucial property is satisfied not only by ordinary least squares (OLS) estimators, but also by specific instances of weighted least squares (WLS) estimators (e.g., [25]).

Problem 1.2. (Weighted least squares) Let T be a phylogenetic X-tree and D be a dissimilarity map. Find the T-additive map $\hat D$ that minimizes

$$\sum_{i,j \in X} \frac{\left(D_{ij} - \hat D_{ij}\right)^2}{V_{ij}}. \qquad (3)$$

The variance matrix for weighted least squares is the $\binom{n}{2} \times \binom{n}{2}$ diagonal matrix V whose diagonal entries are the $V_{ij}$. Note that V can also be regarded as a dissimilarity map, and we will do so in this paper. Weighted least squares for trees was first suggested in [8] and [14], with the former proposing specifically $V_{ij} = D_{ij}^2$.

Theorem 1.3. (Least squares solution) The solution to Problem 1.2 is given by $\hat D = S_T \hat l$, where

$$\hat l = \left(S_T^T V^{-1} S_T\right)^{-1} S_T^T V^{-1} D. \qquad (4)$$

We note that the OLS problem reduces to the case V = I. The statistical significance of the variance matrix, together with a statistical interpretation of Theorem 1.3, is provided in Section 2.
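Equation (4) can be evaluated directly on a small example. The sketch below is illustrative only and is not an implementation from the paper. It hard-codes $S_T$ for the quartet ((1,2),(3,4)), with leaf edges $e_1, \dots, e_4$ and internal edge $e_5$, and stores the diagonal of V as a list. It then solves the normal equations $(S_T^T V^{-1} S_T)\hat l = S_T^T V^{-1} D$ exactly using Python's `fractions`. A numerical linear algebra library would be the natural choice in practice.

```python
from fractions import Fraction

# S_T for the quartet ((1,2),(3,4)); rows are the pairs 12, 13, 14, 23, 24, 34,
# columns are the leaf edges e1..e4 followed by the internal edge e5.
S = [
    [1, 1, 0, 0, 0],
    [1, 0, 1, 0, 1],
    [1, 0, 0, 1, 1],
    [0, 1, 1, 0, 1],
    [0, 1, 0, 1, 1],
    [0, 0, 1, 1, 0],
]


def solve(A, b):
    """Solve the square nonsingular system A x = b exactly by Gauss-Jordan elimination."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [row[-1] for row in M]


def wls_edge_lengths(S, V, D):
    """Equation (4) for a diagonal variance matrix V given by its diagonal."""
    m = len(S[0])
    w = [Fraction(1) / Fraction(v) for v in V]  # inverse variances
    U = [[sum(w[k] * S[k][a] * S[k][b] for k in range(len(S))) for b in range(m)]
         for a in range(m)]
    rhs = [sum(w[k] * S[k][a] * D[k] for k in range(len(S))) for a in range(m)]
    return solve(U, rhs)


# An exactly tree-additive D is recovered for any positive variances ...
print(wls_edge_lengths(S, [1, 2, 3, 4, 5, 6], [3, 9, 10, 10, 11, 7]))
# ... while for a perturbed D the OLS (V = I) internal edge estimate is 9/2.
print(wls_edge_lengths(S, [1] * 6, [3, 9, 10, 10, 11, 8])[4])
```

The first call returns the generating lengths 1, ..., 5 exactly, because a tree-additive D lies in the column space of $S_T$.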

It follows from (4) that the lengths of the edges in a weighted least squares tree are linear combinations of the entries of the dissimilarity map. A natural question is therefore: which variance matrices V result in edge length estimators that satisfy the IIP property? Our main result answers this question in the form of a characterization (Theorem 3.4): a WLS model is IIP if and only if the variance matrix is semi-multiplicative. We show that such matrices are good approximations to the variances resulting from popular distance estimation procedures. Moreover, we provide combinatorial formulas that describe the WLS edge lengths under semi-multiplicative variances (Equation (20)), and show that they lead to optimal algorithms for computing the lengths (Theorem 4.1).

The key idea that leads to our results is a connection between Lagrange multipliers arising in the proof of the Gauss-Markov theorem and the weak fundamental theorem of phylogenetics, which provides a combinatorial characterization of tree-additive maps (Remark 2.5). This explains many isolated results in the literature on least squares in phylogenetics; in fact, as we show in the section "The multiplicative model and other corollaries", almost all the known theorems and algorithms about least squares estimates of edge lengths follow from our results.



2 BLUE Trees 

The foundation of least squares theory in statistics is the Gauss-Markov theorem. This theorem states that the best linear unbiased estimator (BLUE) for a linear combination of the edge lengths, when the errors have zero expectation, is a least squares estimator. We explain this theorem in the context of Problem 1.2.
Lemma 2.1. For any phylogenetic X-tree T, the matrix $S_T$ has full column rank.

Proof: We show that for any e ∈ E, the vector $f_e = (0, \dots, 1, \dots, 0)$ of size |E|, with a 1 in the e-th position and 0 elsewhere, lies in the row span of $S_T$. Write $S_{ij}$ for the row of $S_T$ indexed by the pair ij. Choose any i, j, k, l ∈ X such that the paths from i to j and from k to l do not intersect, and the intersection of the paths from i to k and from j to l is exactly the edge e. For a leaf edge e one may take i = j to be its leaf, with the convention $S_{ii} = 0$. Note that

$$\frac{1}{2}\left(S_{ik} + S_{jl} - S_{ij} - S_{kl}\right) = f_e. \qquad (5)$$
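The following is a small exact check of (5). It uses the quartet ((1,2),(3,4)) with a hand-written $S_T$, with rows keyed by leaf pair. The layout and names are illustrative. The first call uses i, j, k, l = 1, 2, 3, 4 for the internal edge. The second call uses i = j = 1 for the leaf edge $e_1$, with the zero row for the empty path.

```python
from fractions import Fraction

# Rows of S_T for the quartet ((1,2),(3,4)), keyed by unordered leaf pair.
# Columns: leaf edges e1..e4, then the internal edge e5.
ROWS = {
    frozenset({1, 2}): [1, 1, 0, 0, 0],
    frozenset({1, 3}): [1, 0, 1, 0, 1],
    frozenset({1, 4}): [1, 0, 0, 1, 1],
    frozenset({2, 3}): [0, 1, 1, 0, 1],
    frozenset({2, 4}): [0, 1, 0, 1, 1],
    frozenset({3, 4}): [0, 0, 1, 1, 0],
}


def row(i, j):
    """Row of S_T for the pair (i, j); the empty path i = j gives the zero row."""
    return [0] * 5 if i == j else ROWS[frozenset({i, j})]


def lemma_combination(i, j, k, l):
    """Left-hand side of (5): (S_ik + S_jl - S_ij - S_kl) / 2."""
    return [Fraction(a + b - c - d, 2)
            for a, b, c, d in zip(row(i, k), row(j, l), row(i, j), row(k, l))]


print(lemma_combination(1, 2, 3, 4))  # indicator vector of the internal edge e5
print(lemma_combination(1, 1, 3, 2))  # indicator vector of the leaf edge e1
```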

Theorem 2.2. (Gauss-Markov Theorem) Suppose that D is a random dissimilarity map of the form $D = S_T l + \epsilon$, where T is a tree and ε is a vector of random variables satisfying E(ε) = 0 and Var(ε) = V, where V is an invertible variance-covariance matrix for ε.

Let $\mathcal{M}(S_T^T)$ be the linear space generated by the columns of $S_T^T$, and let $f \in \mathcal{M}(S_T^T)$. Then $f^T \hat l = p^T D$ (where $\hat l$ is given by (4)) has minimum variance among the linear unbiased estimators of $f^T l$.

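Proof: Observe that the problem of finding p is equivalent to solving a constrained optimization problem:

$$\min_p\; p^T V p \quad \text{subject to} \quad S_T^T p = f. \qquad (6)$$

The objective specifies that the goal is to minimize the variance; the constraint encodes the requirement that the estimator is unbiased. Using Lagrange multipliers, it is easy to see that the minimum variance unbiased estimator of $f^T l$ is $p^T D$ for the unique vector p satisfying

$$Vp = S_T\mu \quad \text{for some } \mu \in \mathbb{R}^{|E|}, \qquad (7)$$

$$S_T^T p = f. \qquad (8)$$

In other words

$$\begin{pmatrix} V & -S_T \\ S_T^T & 0 \end{pmatrix} \begin{pmatrix} p \\ \mu \end{pmatrix} = \begin{pmatrix} 0 \\ f \end{pmatrix}, \qquad \begin{pmatrix} p \\ \mu \end{pmatrix} = \begin{pmatrix} V^{-1} - V^{-1} S_T U^{-1} S_T^T V^{-1} & V^{-1} S_T U^{-1} \\ -U^{-1} S_T^T V^{-1} & U^{-1} \end{pmatrix} \begin{pmatrix} 0 \\ f \end{pmatrix}, \qquad (9)$$

where $U = S_T^T V^{-1} S_T$. In particular, $\mu = U^{-1} f$ and $p = V^{-1} S_T U^{-1} f$, so $p^T D = f^T U^{-1} S_T^T V^{-1} D = f^T \hat l$ by (4).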
The Gauss-Markov theorem can also be proved directly using linear algebra, but the Lagrange multiplier proof has two advantages. First, it provides a description of p, different from (4), that is simpler and more informative. Second, the technique is general and can be used in many similar settings to find minimum variance unbiased estimators. Hayes and Haslett [18] provide pedagogical arguments in favor of Lagrange multipliers for interpreting least squares coefficients, and discuss the origins of this approach in applied statistics [19].

In phylogenetics, Theorem 2.2 (and its proof) are useful because for each edge e, the vector $f_e$ in the standard basis for $\mathcal{M}(S_T^T)$ is associated with a vector p such that $p^T D$ is the best linear unbiased estimator for the length of e. Similarly, the tree length is estimated from $f = (1, 1, \dots, 1)$, which is also in $\mathcal{M}(S_T^T)$. Condition (7) is particularly interesting because it says that there exists some T-additive map $\Lambda = S_T\mu = Vp$ whose (possibly negative) edge lengths are given by the Lagrange multipliers μ.
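The Lagrange description in (7)-(9) gives a direct way to compute the coefficient vector p. The sketch below uses a hypothetical 7-leaf tree and exact arithmetic via `fractions`; all names are illustrative. It computes $\mu = U^{-1} f_e$ and $p = V^{-1} S_T \mu$ for the internal edge "a", which joins the cherry {1,2} to the centre "c". It then inspects the coefficients of the pairs {3,4}, {3,7}, {4,7} and {5,6}, whose paths avoid both endpoints of that edge. These coefficients should vanish under OLS (V = I) and under the tree-multiplicative choice $V_{ij} = 2^{|\overline{i,j}|}$. For OLS this is Vach's observation; for the tree-multiplicative case it is what Theorem 3.4 predicts.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical 7-leaf tree: cherry {1,2} below node "a"; node "b" carries leaf 3 and
# the cherry {4,7} below node "g"; cherry {5,6} below node "d"; "a", "b" and "d"
# meet at the root "c". Edges are named after their child endpoint.
PARENT = {1: "a", 2: "a", 3: "b", "g": "b", 4: "g", 7: "g",
          5: "d", 6: "d", "a": "c", "b": "c", "d": "c"}
EDGES = [1, 2, 3, 4, 5, 6, 7, "a", "b", "d", "g"]
PAIRS = list(combinations([1, 2, 3, 4, 5, 6, 7], 2))


def path(i, j):
    def up(v):
        edges = set()
        while v in PARENT:
            edges.add(v)
            v = PARENT[v]
        return edges
    return up(i) ^ up(j)


S = [[1 if e in path(i, j) else 0 for e in EDGES] for i, j in PAIRS]


def solve(A, b):
    """Exact Gauss-Jordan elimination for a square nonsingular system A x = b."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(x)] for row, x in zip(A, b)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[pivot] = M[pivot], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                factor = M[r][c]
                M[r] = [x - factor * y for x, y in zip(M[r], M[c])]
    return [row[-1] for row in M]


def blue_coefficients(V, f):
    """p = V^{-1} S_T mu with mu = U^{-1} f and U = S_T^T V^{-1} S_T, as in (7)-(9)."""
    w = [1 / v for v in V]                                   # inverse variances
    m = len(EDGES)
    U = [[sum(wk * row[a] * row[b] for wk, row in zip(w, S)) for b in range(m)]
         for a in range(m)]
    mu = solve(U, f)                                         # Lagrange multipliers
    return [wk * sum(s * u for s, u in zip(row, mu)) for wk, row in zip(w, S)]


f_e = [1 if e == "a" else 0 for e in EDGES]                  # target: length of edge "a"
V_ols = [Fraction(1)] * len(PAIRS)                           # ordinary least squares
V_bme = [Fraction(2) ** len(path(i, j)) for i, j in PAIRS]   # V_ij = 2^{|path(i,j)|}
p_ols = dict(zip(PAIRS, blue_coefficients(V_ols, f_e)))
p_bme = dict(zip(PAIRS, blue_coefficients(V_bme, f_e)))
irrelevant = [(3, 4), (3, 7), (4, 7), (5, 6)]                # paths avoid "a" and "c"
print([p_ols[q] for q in irrelevant], [p_bme[q] for q in irrelevant])
```

Replacing `V_bme` with a variance matrix that is not semi-multiplicative is a simple way to explore the "only if" direction numerically.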




Figure 1: The Lagrange tree Λ for an IIP weighted least squares estimator for the central edge e* of a complete binary tree with 8 leaves. In Proposition 3.5, X = {A, B, C, D, E, F, G, H}, whereas in the proof of Theorem 3.4 the leaf labels represent clades. The IIP property means that the WLS estimate $\hat l_{e^*}$ does not depend on $D_{AB}$, $D_{CD}$, $D_{EF}$ or $D_{GH}$.

The following theorem provides a combinatorial characterization of tree- 
additive maps, and hence of the Lagrange tree A: 

Definition 2.3. (Weak four point condition) A dissimilarity map D satisfies the weak four point condition if for any i, j, k, l ∈ X, two of the following three linear forms are equal:

$$D_{ij} + D_{kl}, \qquad D_{ik} + D_{jl}, \qquad D_{il} + D_{jk}. \qquad (10)$$
Theorem 2.4. (Weak fundamental theorem of phylogenetics) A dissimilarity map D is tree-additive if and only if it satisfies the weak four point condition.



Theorem 2.4 was first proved in [21]. For a recent exposition see Corollary 7.6.8 of [26], where it is derived using the theory of group-valued dissimilarity maps. We note that the pair of equal quantities in the four point condition defines the topology of a quartet. Furthermore, the topology of the tree is uniquely determined by the topologies of all its quartets. We again refer the reader to [26] for details.
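The following sketch covers the weak four point condition (10) and shows how the pair of equal sums determines a quartet topology. It assumes exact (integer or rational) dissimilarities stored in a dictionary keyed by `frozenset` pairs. With floating-point input, the equality tests would need a tolerance. The function names are illustrative.

```python
from itertools import combinations


def quartet_split(D, i, j, k, l):
    """Apply the weak four point condition (10) to the leaves i, j, k, l.

    Returns the split {{x, y}, {z, w}} whose sum D_xy + D_zw is the odd one out,
    None if all three sums coincide (an unresolved quartet), and raises
    ValueError if no two of the three sums are equal.
    """
    def d(x, y):
        return D[frozenset((x, y))]

    sums = {
        frozenset({frozenset({i, j}), frozenset({k, l})}): d(i, j) + d(k, l),
        frozenset({frozenset({i, k}), frozenset({j, l})}): d(i, k) + d(j, l),
        frozenset({frozenset({i, l}), frozenset({j, k})}): d(i, l) + d(j, k),
    }
    values = list(sums.values())
    if len(set(values)) == 3:
        raise ValueError("weak four point condition fails")
    if len(set(values)) == 1:
        return None
    # Exactly two sums agree; the remaining pairing is the quartet topology.
    return next(split for split, s in sums.items() if values.count(s) == 1)


def is_tree_additive(D, leaves):
    """Theorem 2.4: D is tree-additive iff every quartet satisfies (10)."""
    try:
        for quartet in combinations(leaves, 4):
            quartet_split(D, *quartet)
    except ValueError:
        return False
    return True


D = {frozenset(p): v for p, v in zip(
    [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)], [3, 9, 10, 10, 11, 7])}
print(quartet_split(D, 1, 2, 3, 4), is_tree_additive(D, [1, 2, 3, 4]))
```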

The Lagrange equations (7) and (8), together with Theorem 2.4, form the mathematical basis for our results:

Remark 2.5. Condition (7) specifies that Vp must be a T-additive map. It follows that Vp satisfies the weak four point condition. In other words, (7) amounts to a combinatorial characterization of Vp, and hence of p. Condition (8) imposes a normalization requirement on p. Together these conditions are useful for finding p, and also for understanding its combinatorial properties.

In the case of OLS, the Lagrange tree is the middle quartet of the tree shown in Figure 1. Its structure immediately reveals interesting properties of the estimator. For example, the fact that it is a tree on four taxa implies the IIP property. The content of [3, Appendix 2] is that for tree length estimation under the balanced minimum evolution model, the Lagrange tree is the star tree. In fact, we will see that most of the known combinatorial results about least squares estimates of edge and tree lengths can be explained by Remark 2.5 and interpreted in terms of the structure of the Lagrange tree.



3 Main Theorem 

Our main result is a characterization of IIP WLS estimators. In the sections 
that follow we will see that the IIP property for WLS is not only biologically 
desirable, but also statistically motivated and algorithmically convenient. 
We begin by introducing some notation and concepts that are necessary for 
stating our main theorem. 

Definition 3.1. (Clade) A clade of a phylogenetic X-tree T is a subset A ⊂ X such that there exists an edge in T whose removal induces the partition {A, X \ A}. We also use clade to mean the induced topology $T|_A$.
Given a dissimilarity map D and a variance matrix V, we set

$$Z_{AB} := \sum_{a \in A,\, b \in B} \frac{1}{V_{ab}} \qquad \text{and} \qquad D_{AB} := \frac{1}{Z_{AB}} \sum_{a \in A,\, b \in B} \frac{D_{ab}}{V_{ab}},$$

where A, B are disjoint clades. If $e_1, \dots, e_k \in E(T)$ form a path whose ends determine clades A and B, then by the notation $D_{e_1 \dots e_k}$ and $Z_{e_1 \dots e_k}$ we mean $D_{AB}$ and $Z_{AB}$, respectively. Note that if e is an edge in a tree T, then (7) and (8) imply that the Lagrange tree for any WLS estimate of the length of e satisfies $\sum_{i,j:\, e \in \overline{i,j}} V_{ij}^{-1}\Lambda_{ij} = 1$.

Definition 3.2. (Semi-multiplicative map) A dissimilarity map D is semi-multiplicative with respect to disjoint clades A, B if for any $a_1, a_2 \in A$ and $b_1, b_2 \in B$,

$$D_{a_1 b_1} D_{a_2 b_2} = D_{a_1 b_2} D_{a_2 b_1}. \qquad (11)$$

We say that D is semi-multiplicative with respect to T if (11) holds for any pair of disjoint clades A, B not defined by the same edge of T.

Lemma 3.3. V is semi-multiplicative with respect to T if and only if every clade A of T has the following property: for any A' ⊂ A, any clade B disjoint from A and induced by a different edge, and all x ∈ B,

$$Z_{\{x\}A'} / Z_{\{x\}A} = \xi_{A'A}, \qquad (12)$$

where $\xi_{A'A}$ does not depend on x.

It is an easy exercise to prove that A satisfies (12) for all relevant B if and only if (12) holds for the two clades disjoint from A and defined by the two edges adjacent to the edge defining A.

The semi-multiplicative condition is slightly weaker than log D being tree-additive. Indeed, removing the requirement that the clades A, B are defined by different edges of T leaves one with a multiplicative analog of the four point condition. By Theorem 2.4, this is equivalent to $D_{ij} = \prod_{e \in \overline{i,j}} w_e^{-1}$ for some $w: E(T) \to \mathbb{R}_+$.

Theorem 3.4. (Characterization of IIP WLS estimators) A WLS edge length estimator for an edge in a tree T has the IIP property if and only if the variance matrix is semi-multiplicative with respect to T.
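To make the multiplicative structure concrete, the sketch below uses a hypothetical tree and made-up rational edge weights. It builds $V_{ij} = \prod_{e \in \overline{i,j}} w_e^{-1}$ and checks the cross-ratio identity (11) for two disjoint clades. It also checks the ratio condition (12) of Lemma 3.3: $Z_{\{x\}A'}/Z_{\{x\}A}$ should take the same value for every leaf x lying in a clade on the other side of A. All names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical tree (edges named by child endpoint) and positive weights w_e.
PARENT = {1: "a", 2: "a", 3: "b", "g": "b", 4: "g", 7: "g",
          5: "d", 6: "d", "a": "c", "b": "c", "d": "c"}
W = {1: Fraction(1, 2), 2: Fraction(3), 3: Fraction(2), 4: Fraction(5, 4), 7: Fraction(1, 3),
     5: Fraction(2, 3), 6: Fraction(1), "a": Fraction(3, 2), "b": Fraction(1, 5),
     "d": Fraction(4), "g": Fraction(7, 2)}


def root_path(v):
    edges = set()
    while v in PARENT:
        edges.add(v)
        v = PARENT[v]
    return edges


def V(i, j):
    """Tree-multiplicative variance: the product of 1 / w_e over the path from i to j."""
    value = Fraction(1)
    for e in root_path(i) ^ root_path(j):
        value /= W[e]
    return value


def Z(xs, ys):
    """Z_XY: the sum of inverse variances over pairs x in X, y in Y."""
    return sum(1 / V(x, y) for x in xs for y in ys)


A, A_sub, B = {3, 4, 7}, {4, 7}, {5, 6}
# Cross-ratio identity (11) for every a1, a2 in A and b1, b2 in B.
cross_ok = all(V(a1, b1) * V(a2, b2) == V(a1, b2) * V(a2, b1)
               for a1, a2 in combinations(A, 2) for b1, b2 in combinations(B, 2))
# Condition (12): Z_{x}A' / Z_{x}A for leaves x in the clades {1,2} and {5,6}.
ratios = {x: Z({x}, A_sub) / Z({x}, A) for x in (1, 2, 5, 6)}
print(cross_ok, set(ratios.values()))
```

The common factor contributed by the path from x to A cancels in the ratio, so only the weights inside A matter. This is why a single value, $\xi_{A'A}$, appears.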
The proof of the theorem reduces to the WLS solution for the length of an edge in a tree with at most eight leaves (edge e* in Figure 1):

Proposition 3.5. Let T be the phylogenetic X-tree shown in Figure 1. The Lagrange tree $\Lambda = S_T\mu$ for the WLS problem of estimating the length of the edge e* satisfies $\mu_1 = -\mu_2$, $\mu_3 = -\mu_4$, $\mu_5 = -\mu_6$ and $\mu_7 = -\mu_8$. Furthermore, these Lagrange multipliers and the remaining ones $\mu_9, \dots, \mu_{13}$ can be computed by solving $\mu = (S_T^T V^{-1} S_T)^{-1} f_{e^*}$.

Proof: We use the notation of Figure 1, with the convention that the edge labeled by i is $e_i$. It follows from (8) that $\Lambda_{e_i} = 0$ for i = 1, 2, 9. But $\Lambda_{e_i} = \Lambda_{e_i e_j} + \Lambda_{e_i e_k}$ for {i, j, k} = {1, 2, 9}, which implies that $\Lambda_{e_i e_j} = 0$ for all i, j ∈ {1, 2, 9}. Therefore $V_{AB}^{-1}\Lambda_{AB} = V_{AB}^{-1}(\mu_1 + \mu_2) = 0$, and the result follows. The arguments for $e_3, e_4$, for $e_5, e_6$ and for $e_7, e_8$ are identical. The complete solution for μ for a given V is $\mu = (S_T^T V^{-1} S_T)^{-1} f_{e^*}$, which reduces to the inversion of a 13 × 13 matrix.

Note that the proof only uses the fact that $e_1, e_2$ are adjacent leaf edges not adjacent to e*. The conclusion $\mu_{e_1} = -\mu_{e_2}$ holds identically in any tree for a pair of edges of this type.

Proof of Theorem 3.4. We begin by showing that if V is semi-multiplicative, then the WLS edge length estimators have the IIP property. This calculation involves showing that for any phylogenetic X-tree T and edge e* ∈ T, the Lagrange tree for e* is the tree in Figure 1, where A, B, C, D, E, F, G, H are clades with the property that their intra-clade Lagrange multipliers are zero.

Let $e_1, \dots, e_k$, with k ≤ 8, be the edges of T such that either $d(e^*, e_i) = 2$, or $d(e^*, e_i) < 2$ and $e_i$ is a leaf edge. For i ∈ {1, ..., k}, let $C_i$ be the clade defined by $e_i$ that does not contain e*. Let $T/e^*$ be the phylogenetic $X/e^*$-tree, where $X/e^* = \{C_1, \dots, C_k\}$, with topology induced by T in the natural way (see Figure 1). Let $V/e^*$ be the diagonal variance matrix on pairs of nodes in $X/e^*$ given by $(V/e^*)_{C_iC_j} = Z_{C_iC_j}^{-1}$.

If $\mu^{/e^*}$ are the Lagrange multipliers and $\Lambda^{/e^*}$ is the Lagrange tree obtained by estimating $l_{e^*}$ for topology $T/e^*$ and variance $V/e^*$, then the T-additive map given by $\Lambda = S_T\mu$ with

$$\mu_e = \begin{cases} \mu^{/e^*}_e & \text{if } e \in E(T/e^*), \\ 0 & \text{otherwise}, \end{cases} \qquad (13)$$

satisfies the Lagrange equations for T. Thus μ are the Lagrange multipliers for $f_{e^*}$, and $\hat l_{e^*} = \Lambda^T V^{-1} D$.
We let $\Lambda^{/e^*}$ and $Z^{/e^*}$ denote the natural correspondents of Λ and Z for the problem of estimating $l_{e^*}$ from $V/e^*$ and $T/e^*$. It is an easy exercise to check that for all e ∈ E(T/e*), we have $Z^{/e^*}_e = Z_e$ and $\Lambda^{/e^*}_e = \Lambda_e$. This implies that the Lagrange equations (8) are satisfied for all e ∈ E(T/e*).

Now consider an edge e ∈ $C_1$. We need to verify that $\Lambda_e = 0$. Since $\Lambda_{ij} = 0$ for all i, j ∈ $C_1$, we have $\Lambda_e = \Lambda_{e \dots e_2} + \Lambda_{e \dots e_9}$. Now for all i ∈ $C_1$ and j ∈ $C_2$, $\Lambda_{ij} = \mu_1 + \mu_2 = 0$, so $\Lambda_{e \dots e_2} = 0$. Finally, write A = $C_1$, let A' ⊂ A be the clade defined by e, and let A'' be the clade defined by $e_9$ that does not intersect A. The fact that V is semi-multiplicative implies that for any taxon x ∈ A'',

$$Z_{\{x\}A'} / Z_{\{x\}A} = \xi_{A'A}, \qquad (14)$$

where $\xi_{A'A}$ does not depend on the taxon x. This implies $\Lambda_{e \dots e_9} = \xi_{A'A}\Lambda_{e_1 \dots e_9}$, and $\Lambda_{e_1 \dots e_9} = 0$ by the proof of Proposition 3.5. Hence $\Lambda_e = 0$.

Since $\mu_e = 0$ for all $e \notin E(T/e^*)$, it is enough to show that $\Lambda^{/e^*}$ satisfies the IIP property. This follows from Proposition 3.5. Therefore V has the IIP property with respect to T, i.e. $\Lambda_{ij} = 0$ for all i, j ∈ X such that $\overline{i,j}$ does not intersect e*.

This concludes the proof of the "if" part of Theorem 3.4. For the "only if" direction, we prove by induction that (12) is satisfied by all clades A of T, and thus that the variance V is semi-multiplicative with respect to T. The base case is provided by clades formed by a single leaf, for which (12) holds vacuously.

For the induction step, suppose that clades A and B both satisfy (12), and that they are defined by adjacent edges $e_A$ and $e_B$ (see Figure 2). Let $e_C$ be the other edge adjacent to $e_A$ and $e_B$, and let C = X \ (A ∪ B) be the clade it defines. We would like to prove that the clade A ∪ B also satisfies (12). If |C| = 1, this holds vacuously. We may therefore assume that there exist two more edges $e_1, e_2$ incident with $e_C$. Let $C_i \subset C$ be the clade defined by $e_i$, for i = 1, 2. It suffices to prove that A ∪ B satisfies (12) with respect to $C_1$ and $C_2$. Notice that A and B already satisfy (12) with respect to $C_1$ and $C_2$. Therefore it is enough to show that the ratio

$$\frac{Z_{\{x\}A}}{Z_{\{x\}(A \cup B)}} \qquad (15)$$

is the same for all x ∈ $C_1$, and similarly for all x ∈ $C_2$.

Now consider the problem of estimating $l_{e_A}$. Let μ be the corresponding Lagrange multipliers and $\Lambda = S_T\mu$ the Lagrange tree they define. By the IIP property, Λ defines an identically zero tree-additive map on the clade C. Therefore the edge lengths corresponding to this map are all zero. This implies $\mu_e = 0$ for all e ∈ E(C) with e ≠ $e_1, e_2$, and also $\mu_{e_1} + \mu_{e_2} = 0$.

Figure 2: Configuration of the induction in the proof that IIP WLS models are semi-multiplicative.

Let $A_1, \dots, A_k$, with k ≤ 4, and $B_1, \dots, B_t$, with t ≤ 2, be the subclades of A and B, respectively, corresponding to nodes of $T/e_A$. Then for any x ∈ $C_1$, y ∈ $A_i$ and z ∈ $B_j$, $\Lambda_{xy} = \Lambda^{/e_A}_{C_1A_i}$ does not depend on x, y, and $\Lambda_{xz} = \Lambda^{/e_A}_{C_1B_j}$ does not depend on x, z.

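Now pick x ∈ $C_1$ and let e be the leaf edge adjacent to it. Then $\Lambda_e = 0$. Since all Lagrange multipliers inside the clade $C_1$ are zero, $\Lambda_e = \Lambda_{e \dots e_1} = \Lambda_{e \dots e_2} + \Lambda_{e \dots e_C}$. Since $\mu_{e_1} + \mu_{e_2} = 0$, we have $\Lambda_{e \dots e_2} = 0$. Thus $\Lambda_{e \dots e_C} = \Lambda_{\{x\}A} + \Lambda_{\{x\}B} = 0$. Equivalently,

$$\sum_{i=1}^{k} Z_{\{x\}A_i}\,\Lambda^{/e_A}_{C_1A_i} + \sum_{j=1}^{t} Z_{\{x\}B_j}\,\Lambda^{/e_A}_{C_1B_j} = 0,$$

$$Z_{\{x\}A}\sum_{i=1}^{k} \xi_{A_iA}\,\Lambda^{/e_A}_{C_1A_i} + Z_{\{x\}B}\sum_{j=1}^{t} \xi_{B_jB}\,\Lambda^{/e_A}_{C_1B_j} = 0, \qquad (16)$$

where the second form uses the induction hypothesis (12) for A and B. This imposes a linear equation on $Z_{\{x\}A}$ and $Z_{\{x\}B}$ whose coefficients do not depend on x. Thus the following quantity also does not depend on x:

$$\frac{Z_{\{x\}A}}{Z_{\{x\}(A \cup B)}} = \frac{Z_{\{x\}A}}{Z_{\{x\}A} + Z_{\{x\}B}}. \qquad (17)$$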
4 An optimal algorithm for WLS edge lengths

Theorem 4.1. (Computing WLS edge lengths) Let D be a dissimilarity map and V an IIP variance matrix. The WLS estimates of all edge lengths of a tree T can be computed in O(n²) time, where n is the number of leaves in T.



Figure 3: Configuration of the dynamic programming recursion for computing WLS edge lengths. A, B and A ∪ B are clades, and C is a clade disjoint from A ∪ B. The oval in the middle represents the rest of the tree.

Proof: It is apparent from the proof of Theorem 3.4 that all one needs in order to compute the WLS edge lengths are the values of $D_{AB}$ and $Z_{AB}$, where A and B are disjoint clades of T. We define the height of a clade to be the distance (in edges) between its root and its farthest leaf. Here the root is taken to be the endpoint of the clade-defining edge that lies on the side of the clade. Thus the height of a clade formed by just one leaf is 0.

Now consider the configuration in Figure 3. The clades A, B, C are pairwise disjoint, and A and B are adjacent. It is easy to see that A ∪ B forms a clade for which

$$Z_{A \cup B, C} = Z_{AC} + Z_{BC}, \qquad (18)$$

$$D_{A \cup B, C} = \left(D_{AC} Z_{AC} + D_{BC} Z_{BC}\right) / Z_{A \cup B, C}. \qquad (19)$$

Therefore one needs only constant time to compute $D_{A \cup B, C}$ and $Z_{A \cup B, C}$ if $D_{AC}$, $Z_{AC}$, $D_{BC}$ and $Z_{BC}$ are known. There are O(n) clades, since there are O(n) edges, and thus there are O(n²) pairs of disjoint clades. We can compute $D_{AB}$ and $Z_{AB}$ for all pairs A, B through a simple dynamic program. We start with pairs of clades of height 0, for which the values of D and Z are trivially given by $D_{ab}$ and $V_{ab}^{-1}$. After round 2t of the algorithm we know $D_{AB}$ and $Z_{AB}$ for all disjoint pairs A, B of height at most t. After round 2t + 1 we know them for all disjoint pairs A, B of height at most t + 1 and at most t, respectively. The algorithm requires constant time per clade pair. Subsequently, all O(n) edge lengths can be computed in constant time per edge: the calculation of each edge length involves only a constant number of multiplications and one matrix inversion (of size at most 13 × 13). The algorithm is therefore optimal, since its running time is proportional to the size of the input, which consists of the $\binom{n}{2}$ entries of D (and of V).
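The following sketches the merge step (18)-(19). It assumes a hypothetical rooted binary tree (`CHILDREN`) and made-up rational dissimilarities and variances. The function memoizes $(Z_{AC}, D_{AC})$ over pairs of disjoint subtrees and always splits a non-leaf side into its two children, so each pair costs O(1) once its sub-pairs are known. It is only a partial illustration of Theorem 4.1. It covers only clades that are subtrees of this fixed rooting, not the complementary clades of the unrooted tree, and it uses memoized recursion instead of the height-ordered rounds described above. A brute-force evaluation of the definitions is included for comparison.

```python
from fractions import Fraction
from functools import lru_cache

# Hypothetical rooted binary tree: internal node -> (left child, right child).
CHILDREN = {"r": ("x", "y"), "x": (1, 2), "y": ("z", 5), "z": (3, 4)}
# Hypothetical dissimilarities and variances, keyed by unordered leaf pairs.
DIST = {frozenset(p): Fraction(d) for p, d in [
    ((1, 2), 3), ((1, 3), 7), ((1, 4), 8), ((1, 5), 6), ((2, 3), 6),
    ((2, 4), 9), ((2, 5), 5), ((3, 4), 4), ((3, 5), 5), ((4, 5), 6)]}
VAR = {p: 1 + d / 4 for p, d in DIST.items()}


def leaves(v):
    if v not in CHILDREN:
        return [v]
    left, right = CHILDREN[v]
    return leaves(left) + leaves(right)


@lru_cache(maxsize=None)
def pair_stats(a, c):
    """(Z_AC, D_AC) for the disjoint clades rooted at a and c, via (18) and (19)."""
    if a not in CHILDREN and c not in CHILDREN:      # two single leaves: base case
        pair = frozenset((a, c))
        return 1 / VAR[pair], DIST[pair]
    if a not in CHILDREN:                            # always split a non-leaf side
        a, c = c, a
    (z1, d1), (z2, d2) = (pair_stats(child, c) for child in CHILDREN[a])
    z = z1 + z2                                      # equation (18)
    return z, (d1 * z1 + d2 * z2) / z                # equation (19)


def brute_force(a, c):
    """Direct evaluation of the definitions of Z_AC and D_AC."""
    pairs = [frozenset((x, y)) for x in leaves(a) for y in leaves(c)]
    z = sum(1 / VAR[p] for p in pairs)
    return z, sum(DIST[p] / VAR[p] for p in pairs) / z


print(pair_stats("x", "y"), brute_force("x", "y"))
```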

We note that many algorithms have been proposed for computing WLS edge lengths for certain specific models (these are discussed in the next section). Existing approaches rely on different recursive schemes that lead to markedly different algorithms. Some attempt to reduce the size of the problem by agglomerating leaves [4]; others start with a star topology and gradually extend it by refining internal nodes [23]. In fact, all these methods implicitly compute Lagrange multipliers in a recursive way. Dealing directly with Lagrange multipliers may in many cases clarify the exposition and suggest simplified implementations. As the above theorem shows, however, once one has closed-form expressions for the edge lengths, these inductive arguments can easily be replaced by our dynamic program.

5 The multiplicative model and other corollaries 

In this section we begin by giving formulas for the WLS edge lengths assuming a tree-multiplicative variance matrix, i.e. $V_{ij} = \prod_{e \in \overline{i,j}} w_e^{-1}$ for some $w: E(T) \to \mathbb{R}_+$. Throughout the section, $e^* \in E(T)$ denotes the edge for which the WLS length is being computed. If e* is an internal edge, then A, B, C, D are the adjacent clades, with A and B on one side of e* and C and D on the other. If e* is adjacent to a leaf, that leaf is labeled i and the adjacent clades are A and B.

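Proposition 5.1. If V is a tree-multiplicative variance matrix, then the WLS edge length of an internal edge is given by

$$2\hat l_{e^*} = \frac{Z_{AD} + Z_{CB}}{Z_{A \cup B, C \cup D}}\left(D_{AC} + D_{BD}\right) + \frac{Z_{AC} + Z_{DB}}{Z_{A \cup B, C \cup D}}\left(D_{AD} + D_{BC}\right) - D_{AB} - D_{CD}. \qquad (20)$$

If e* is adjacent to a leaf, then the WLS length is

$$2\hat l_{e^*} = D_{Ai} + D_{Bi} - D_{AB}. \qquad (21)$$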
At first glance these formulas may seem surprising, but the derivation is straightforward after solving for the Lagrange multipliers.

Proof: By the results of the previous section, it is enough to verify that the Lagrange equations hold. By Proposition 3.5, this is equivalent to verifying that the Lagrange equations hold for $T/e^*$ and $V/e^*$, which is a simple exercise left to the reader.
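Formula (20) needs only clade-level summaries. The sketch below uses a hypothetical 7-leaf tree with made-up edge weights $w_e$ and edge lengths, all chosen purely for illustration. It builds a tree-multiplicative V, computes $Z_{XY}$ and $D_{XY}$ by brute force for the four clades around an internal edge, and evaluates (20). Because the dissimilarity map here is exactly tree-additive, the formula should return the true length of the edge. The final checks compare against that length.

```python
from fractions import Fraction

# Hypothetical tree: cherry {1,2} below node "a"; node "b" carries leaf 3 and the
# cherry {4,7} below "g"; cherry {5,6} below "d"; "a", "b", "d" meet at the root "c".
PARENT = {1: "a", 2: "a", 3: "b", "g": "b", 4: "g", 7: "g",
          5: "d", 6: "d", "a": "c", "b": "c", "d": "c"}
# Hypothetical multiplicative weights w_e and edge lengths l_e (keyed by child endpoint).
W = {1: Fraction(1, 2), 2: Fraction(3), 3: Fraction(2), 4: Fraction(5, 4), 7: Fraction(1, 3),
     5: Fraction(2, 3), 6: Fraction(1), "a": Fraction(3, 2), "b": Fraction(1, 5),
     "d": Fraction(4), "g": Fraction(7, 2)}
LENGTH = {1: 1, 2: 2, 3: 3, 4: 4, 7: 7, 5: 5, 6: 6, "a": 10, "b": 11, "d": 12, "g": 13}


def path(i, j):
    def up(v):
        edges = set()
        while v in PARENT:
            edges.add(v)
            v = PARENT[v]
        return edges
    return up(i) ^ up(j)


def inv_var(i, j):
    """1 / V_ij for the tree-multiplicative model V_ij = product of 1 / w_e."""
    result = Fraction(1)
    for e in path(i, j):
        result *= W[e]
    return result


def dist(i, j):
    """A tree-additive dissimilarity map built from LENGTH."""
    return sum(LENGTH[e] for e in path(i, j))


def Z(X, Y):
    return sum(inv_var(x, y) for x in X for y in Y)


def Dbar(X, Y):
    """The weighted average D_XY, with weights 1 / V_xy."""
    return sum(inv_var(x, y) * dist(x, y) for x in X for y in Y) / Z(X, Y)


def internal_edge_length(A, B, C, D):
    """Equation (20); A and B lie on one side of the edge, C and D on the other."""
    total = Z(A, C) + Z(A, D) + Z(B, C) + Z(B, D)        # Z_{A u B, C u D} by (18)
    lam = (Z(A, D) + Z(C, B)) / total
    two_l = (lam * (Dbar(A, C) + Dbar(B, D))
             + (1 - lam) * (Dbar(A, D) + Dbar(B, C))
             - Dbar(A, B) - Dbar(C, D))
    return two_l / 2


print(internal_edge_length({1}, {2}, {3, 4, 7}, {5, 6}))   # edge "a"
print(internal_edge_length({3}, {4, 7}, {1, 2}, {5, 6}))   # edge "b"
```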

We now present a number of previous results about least squares that can be interpreted (and in some cases completed) using Theorems 3.4 and 4.1 and Proposition 5.1. All the models we discuss are special cases of the multiplicative variance model, and all of our statements can be easily proven by substituting the appropriate form of V into (20) and (21).

Ordinary least squares. 

This is the first model considered for least squares phylogenetics, and is the 
most studied model for edge and tree length estimation. It corresponds to 
the variance matrix equal to the identity matrix. 



Corollary 5.2. (Rzhetsky [24]) The ordinary least squares estimate $p^T D = f_{e^*}^T (S_T^T S_T)^{-1} S_T^T D$ for the length of an internal edge e* is given by

$$2\hat l_{e^*} = \frac{n_A n_D + n_B n_C}{(n_A + n_B)(n_C + n_D)}\left(D_{AC} + D_{BD}\right) + \frac{n_A n_C + n_B n_D}{(n_A + n_B)(n_C + n_D)}\left(D_{AD} + D_{BC}\right) - D_{AB} - D_{CD}, \qquad (22)$$

where $n_A, n_B, n_C$ and $n_D$ are the numbers of leaves in the clades A, B, C and D, and $D_{AC} = \frac{1}{n_A n_C}\sum_{a \in A,\, c \in C} D_{ac}$. If e* is a leaf edge, $\hat l_{e^*}$ is given by

$$2\hat l_{e^*} = D_{Ai} + D_{Bi} - D_{AB}. \qquad (23)$$

Our algorithm for computing edge lengths (Theorem 4.1) reduces, in the case of OLS, to that of [?]. It has the same optimal running time as the algorithms in [?, ?, 27].



Balanced minimum evolution. 



The Balanced Minimum Evolution (BME) model was introduced by Pauplin in [22]. The motivation was that when computing $\hat l_{e^*}$ in the OLS model, the distances $D_{ac}$ and $D_{bd}$ can receive different weights than $D_{ad}$ and $D_{bc}$, where a ∈ A, b ∈ B, c ∈ C and d ∈ D. Pauplin therefore suggested an alternative model in which all clades are weighted equally.
Corollary 5.3. (Pauplin's edge formula) The WLS edge lengths with variance model $V_{ij} \propto 2^{|\overline{i,j}|}$ are given by $\hat l_{e^*} = \frac{1}{4}(D_{AC} + D_{BD} + D_{AD} + D_{BC}) - \frac{1}{2}(D_{AB} + D_{CD})$ for internal edges, and by $\hat l_{e^*} = \frac{1}{2}(D_{Ai} + D_{Bi}) - \frac{1}{2}D_{AB}$ for edges adjacent to leaves.

Proof: This corresponds to the multiplicative variance model with $w_e = 0.5$ for all edges e. One can easily show that in this case $Z_{AB} \propto 2^{-|A,B|}$, where |A,B| is the number of edges on the path joining the roots of A and B. The result then follows trivially from Theorem 3.4.

As far as we are aware, this is the first proof that the formulas given 
by Pauplin for edge lengths are in fact the WLS edge weights under the 
variance model described above. This implies: 

Remark 5.4. The edge weights of the neighbor-joining tree obtained from the standard reduction formula are equal to the weighted least squares edge length estimates under the BME model.

This result is a companion to the connection between Pauplin's tree length formula and WLS tree length under the BME model, which was established by Desper and Gascuel in [5]. They proved the following:

Corollary 5.5. (Desper and Gascuel [5]) The tree length estimator given by $\hat l = \sum_{a,b} 2^{1-|\overline{a,b}|} D_{ab}$ is the minimum variance tree length estimator for the BME model. It is also identical to the one given by the coefficients

Proof: The second part of the corollary follows trivially from Theorem 2.2. The first part follows from a simple combinatorial argument that adds up the WLS edge lengths. Alternatively, one can notice directly that since $p_{ab} = 2^{1-|\overline{a,b}|}$, it follows that $p_{ab}V_{ab}$ is the uniform vector. It therefore defines a T-additive map corresponding to the star topology (equal-length leaf edges and zero-length internal edges). Finally, $\sum_{i,j} (S_T)_{ij,e}\, p_{ij} = 1$ for every edge e follows from an easy counting argument. Further elaboration on Remark 5.4 is beyond the scope of this paper.
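The sketch below checks the two ingredients of the proof of Corollary 5.5. It uses a hypothetical binary 7-leaf tree and made-up edge lengths. First, for every edge, the coefficients $2^{1-|\overline{a,b}|}$ of the pairs the edge separates sum to 1; this is the counting argument, and it relies on the tree being binary. Second, as a consequence, $\sum_{a,b} 2^{1-|\overline{a,b}|} D_{ab}$ returns the total edge length when D is tree-additive. All names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical binary tree, edges named by child endpoint, with made-up lengths.
PARENT = {1: "a", 2: "a", 3: "b", "g": "b", 4: "g", 7: "g",
          5: "d", 6: "d", "a": "c", "b": "c", "d": "c"}
LENGTH = {1: 1, 2: 2, 3: 3, 4: 4, 7: 7, 5: 5, 6: 6, "a": 10, "b": 11, "d": 12, "g": 13}
LEAVES = [1, 2, 3, 4, 5, 6, 7]


def path(i, j):
    def up(v):
        edges = set()
        while v in PARENT:
            edges.add(v)
            v = PARENT[v]
        return edges
    return up(i) ^ up(j)


def bme_coefficient(i, j):
    """p_ij = 2^(1 - |path(i, j)|), the coefficient in Pauplin's tree length formula."""
    return Fraction(2) ** (1 - len(path(i, j)))


def pauplin_tree_length(D):
    return sum(bme_coefficient(i, j) * D[i, j] for i, j in combinations(LEAVES, 2))


# Counting argument: the coefficients of the pairs separated by each edge sum to 1.
edge_sums = {e: sum(bme_coefficient(i, j) for i, j in combinations(LEAVES, 2)
                    if e in path(i, j))
             for e in LENGTH}
D = {(i, j): sum(LENGTH[e] for e in path(i, j)) for i, j in combinations(LEAVES, 2)}
print(set(edge_sums.values()), pauplin_tree_length(D), sum(LENGTH.values()))
```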

The taxon-weighted variance model. 

Another well-known WLS model was introduced by Denis and Gascuel in [4]. Under this model we set $V_{ij} = (t_i t_j)^{-1}$ for some $t_1, \dots, t_n \in \mathbb{R}_+$. In the tree-multiplicative model, this corresponds to setting $w_e = 1$ for internal edges and $w_e = t_i$ when e is the leaf edge adjacent to leaf i. The paper [4] gives a beautiful proof of the statistical consistency of this model (which implies statistical consistency of OLS), and also provides an O(n²) algorithm for computing the WLS edge lengths. However, the algorithm is based on a recursive agglomeration scheme, and an explicit formula for the edge lengths in terms of the values of D is not given. Such a formula follows from Theorem ??:

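Corollary 5.6. For e* an internal edge of T, the WLS edge length $\hat l_{e^*}$ is given by

$$2\hat l_{e^*} = \frac{T_A T_D + T_C T_B}{(T_A + T_B)(T_C + T_D)}\left(D_{AC} + D_{BD}\right) + \frac{T_A T_C + T_D T_B}{(T_A + T_B)(T_C + T_D)}\left(D_{AD} + D_{BC}\right) - \left(D_{AB} + D_{CD}\right), \qquad (24)$$

where $T_X = \sum_{x \in X} t_x$ and $D_{XY} = \sum_{x \in X,\, y \in Y} \frac{t_x t_y}{T_X T_Y} D_{xy}$. If e* is adjacent to a leaf, then

$$2\hat l_{e^*} = D_{Ai} + D_{Bi} - D_{AB}. \qquad (25)$$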



6 Final remarks 

An important question is whether the variance matrices required for the IIP property to hold are realistic for problems where branch lengths are estimated using standard evolutionary models. In fact, semi-multiplicative matrices do not exactly capture the desired form of the variance, but they are good approximations. We illustrate this for the Jukes-Cantor model [?].



Proposition 6.1. (Variance of distance estimates [20]) Let the random variable Y be the fraction of different nucleotides between two sequences of length n that are generated from the Jukes-Cantor process with branch length δ. Then the expected value of the empirical distance $D = -\frac{3}{4}\log\left(1 - \frac{4}{3}Y\right)$ is approximately δ, and its variance is

$$\mathrm{Var}(D) \approx \frac{3}{16n}\left(e^{\frac{8}{3}\delta} + 2e^{\frac{4}{3}\delta} - 3\right). \qquad (26)$$

This result can be extended to more general models. Branch lengths for an evolutionary model are tree-additive, and for larger δ the term $e^{\frac{8}{3}\delta}$ dominates (26); this term is therefore multiplicative along paths. This shows that for many regimes of the parameter δ, a tree-multiplicative model for variances is very reasonable. For a discussion of the statistical rationale behind least squares, see [?].
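The following is a numerical sketch of Proposition 6.1; the function names are illustrative. It evaluates (26) and checks it against the equivalent delta-method expression $p(1-p)/\big(n(1-\frac{4}{3}p)^2\big)$, with $p = \frac{3}{4}(1 - e^{-4\delta/3})$. It also prints how much $\log \mathrm{Var}(D)$ grows when δ increases by 0.5. A purely multiplicative term $e^{8\delta/3}$ would give exactly 4/3, so values close to 4/3 indicate the regime in which a tree-multiplicative variance is a good approximation.

```python
import math


def jc_distance(y):
    """Jukes-Cantor distance -(3/4) log(1 - 4y/3) for a mismatch fraction y < 3/4."""
    return -0.75 * math.log(1 - 4 * y / 3)


def jc_variance(delta, n):
    """Approximation (26) to Var(D) for branch length delta and sequence length n."""
    return 3 / (16 * n) * (math.exp(8 * delta / 3) + 2 * math.exp(4 * delta / 3) - 3)


def delta_method_variance(delta, n):
    """Delta-method form (dD/dy)^2 * p(1 - p) / n, evaluated at y = p = E[Y]."""
    p = 0.75 * (1 - math.exp(-4 * delta / 3))
    return p * (1 - p) / (n * (1 - 4 * p / 3) ** 2)


n = 1000
for delta in (0.05, 0.5, 1.0, 2.0):
    # Growth of log Var(D) when delta increases by 0.5; a purely exponential
    # term e^{8 delta / 3} would give exactly 4/3.
    step = math.log(jc_variance(delta + 0.5, n)) - math.log(jc_variance(delta, n))
    print(delta, round(step, 3))
```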
Unfortunately, the Fitch-Margoliash assumption that the variance $V_{ij} = \mathrm{Var}(D_{ij}) \propto D_{ij}^2$ is inaccurate in light of (26). Nor does it lead to IIP estimates, since V is not semi-multiplicative. This means that for generic dissimilarity maps, the Fitch-Margoliash least squares estimates of edge lengths will depend on irrelevant distance estimates.

Another important point is that although it follows from Theorem 2.2 that for any V and f there is a unique BLUE p for $f^T l$, the converse of this statement is not true. For example, if p is BLUE for $f^T l$ with variance matrix V, then p is also BLUE for $f^T l$ with variance matrix kV, where k > 0. This is obvious because $S_T^T p$ remains the same, and kVp is a T-additive map if Vp is a T-additive map. However, this point has more subtle (and serious) consequences:

Proposition 6.2. (Non-uniqueness of tree length) The WLS estimated tree length with $V_{ij} = \left(c_1 + c_2(|\overline{i,j}| - 1)\right) 2^{|\overline{i,j}|}$ does not depend on the constants $c_1$ and $c_2$.
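The following is an exact four-taxon check of Proposition 6.2 on the quartet ((1,2),(3,4)). It uses a made-up dissimilarity map that is not tree-additive. The number of edges $|\overline{i,j}|$ is read off as the row sum of $S_T$, and the WLS tree length is the sum of the edge lengths from (4). For the illustrative choices of $(c_1, c_2)$ below, the totals should coincide. Here they all equal 31/2, the value of Pauplin's formula $\sum 2^{1-|\overline{a,b}|} D_{ab}$. This is only a sanity check on one small example, not a proof.

```python
from fractions import Fraction

# S_T for the quartet ((1,2),(3,4)); rows 12, 13, 14, 23, 24, 34; columns e1..e5.
S = [[1, 1, 0, 0, 0], [1, 0, 1, 0, 1], [1, 0, 0, 1, 1],
     [0, 1, 1, 0, 1], [0, 1, 0, 1, 1], [0, 0, 1, 1, 0]]
D = [3, 9, 10, 10, 11, 8]  # hypothetical, not tree-additive


def solve(A, b):
    """Exact Gauss-Jordan elimination for a square nonsingular system A x = b."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(x)] for row, x in zip(A, b)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[pivot] = M[pivot], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                factor = M[r][c]
                M[r] = [x - factor * y for x, y in zip(M[r], M[c])]
    return [row[-1] for row in M]


def wls_tree_length(V):
    """Sum of the WLS edge lengths from (4) for a diagonal V."""
    m = len(S[0])
    w = [1 / Fraction(v) for v in V]
    U = [[sum(wk * row[a] * row[b] for wk, row in zip(w, S)) for b in range(m)]
         for a in range(m)]
    rhs = [sum(wk * row[a] * d for wk, row, d in zip(w, S, D)) for a in range(m)]
    return sum(solve(U, rhs))


def prop_6_2_variances(c1, c2):
    """V_ij = (c1 + c2 (|path| - 1)) 2^|path|, with |path| the row sum of S_T."""
    return [(c1 + c2 * (sum(row) - 1)) * 2 ** sum(row) for row in S]


print([wls_tree_length(prop_6_2_variances(c1, c2)) for c1, c2 in [(1, 0), (1, 3), (2, 5)]])
```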

Proposition 6.2 has significance for the interpretation of the neighbor-joining algorithm. Based on [?], it is shown in [12] that neighbor-joining minimizes the balanced minimum evolution criterion at each step. The criterion is argued to be statistically relevant because it is the BLUE for the tree length under the assumption that $V_{ij} \propto 2^{|\overline{i,j}|}$. Proposition 6.2 shows that there are many (significantly) different variance assumptions that yield the same tree length estimate. In fact, for some tree topologies it is even possible that the OLS tree length is equal to the BME WLS tree length (for example, for 5-taxon trees). This means that some information about the variance is discarded when minimizing the tree length. From this point of view, the fact that the balanced minimum evolution criterion equals the BLUE tree length for multiple variance assumptions can be seen as a weakness of balanced minimum evolution methods, not a strength.

There are other issues important in least squares applications in phylogenetics that we have not addressed in this paper. One obvious difficulty with applying WLS methods to tree length estimation is that the resulting estimators are tree-additive, and not necessarily tree metrics. That is, some edge length estimates may be negative. A number of strategies for solving the non-negative WLS problem have been proposed.

Our optimal algorithm for weighted least squares edge length estimates for multiplicative matrices is similar in spirit to some of the algorithms in [?]. In fact, we believe that all the fast algorithms for WLS edge lengths can be understood within a single framework. The unifying concept is the observation that they all essentially estimate the Lagrange tree, via either a top-down or a bottom-up approach. We defer a detailed discussion of this to another paper. Finally, a key issue is that of consistency for specific forms of variance matrices assigned to all trees [?]. An obvious question is which classes of semi-multiplicative variance matrices result in consistent tree estimates. A full discussion of this topic is also beyond the scope of this paper.